Introduction

Dataset

Description of Variables

“First Attempts” Geo Plots

Baseline EDA of Income and Population

Population Histogram and QQ

Income Histogram and QQ

## [1] "26139.73"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     128   18776   24730   26140   32247   56040    3589
## [1] 10274.98
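
The three outputs above look like a mean, a five-number summary with an NA count, and a standard deviation. The quoted mean suggests it was formatted as a character string before printing. The following is a minimal sketch of how such output, together with the histogram and Q-Q plot, might be produced. The vector `income` is simulated stand-in data and `plot_hist_qq` is a hypothetical helper; neither comes from the original analysis.

```r
set.seed(1)

# Simulated stand-in for an income-per-capita column with missing values
# (an assumption: the real data are not reproduced here).
income <- c(rnorm(500, mean = 26000, sd = 10000), rep(NA, 20))

# Mean and standard deviation, ignoring missing values.
income_mean <- mean(income, na.rm = TRUE)
income_sd   <- sd(income, na.rm = TRUE)

# format() returns a character string, which prints with quotes.
print(format(round(income_mean, 2), nsmall = 2))
print(summary(income))   # includes an NA's column when values are missing
print(income_sd)

# Hypothetical helper: histogram and normal Q-Q plot side by side.
plot_hist_qq <- function(x, label) {
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))
  hist(x, breaks = 50, main = paste("Histogram of", label), xlab = label)
  qqnorm(x, main = paste("Normal Q-Q plot of", label))
  qqline(x)
}

if (interactive()) plot_hist_qq(income, "income per capita")
```

The same helper could be reused for the population variable by passing a different vector and label.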

Individual EDA of Independent Variables

## Observations per group: 18417, 17756, 22780, 13244. 1064 missing.
##  Factor w/ 4 levels "[-0.827,-0.256]",..: 3 3 3 3 3 3 3 3 3 3 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -0.8272 -0.2556  0.0152 -0.0866  0.1376  0.3550    1064
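
Each block in this section pairs a four-level factor with the numeric summary of the variable it was cut from, and the factor's break points match the quartiles in that summary. The following is a sketch of quartile binning in base R, assuming simulated data in a hypothetical vector `x`; the function that produced the "Observations per group" line is not shown in the output, so this is not necessarily the author's method.

```r
set.seed(2)

# Simulated stand-in for one numeric predictor with some missing values.
x <- c(runif(1000, min = -0.8, max = 0.4), rep(NA, 15))

# Quartile break points (min, Q1, median, Q3, max), ignoring NAs.
breaks <- quantile(x, probs = seq(0, 1, by = 0.25), na.rm = TRUE)

# include.lowest = TRUE closes the first interval on the left, giving
# labels of the form "[a,b]", "(b,c]", "(c,d]", "(d,e]".
x_q <- cut(x, breaks = breaks, include.lowest = TRUE)

print(table(x_q, useNA = "ifany"))  # observations per group, plus NAs
str(x_q)
print(summary(x))
```

Note that `cut()` stops with an error when the break points are not unique, which would happen for a variable whose minimum, first quartile, and median are all 0.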

## Observations per group: 18061, 18704, 20888, 14544. 1064 missing.
##  Factor w/ 4 levels "[-0.0444,0.0135]",..: 1 1 1 1 1 1 1 1 1 1 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -0.0444  0.0135  0.0803  0.0641  0.1064  0.2450    1064

## Observations per group: 19180, 17739, 17836, 17442. 1064 missing.
##  Factor w/ 4 levels "[-0.457,-0.223]",..: 3 3 3 3 3 3 3 3 3 3 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -0.4569 -0.2228 -0.0737 -0.1511 -0.0322  0.0715    1064

## Observations per group: 18164, 19958, 16580, 17495. 1064 missing.
##  Factor w/ 4 levels "[-0.37,-0.0602]",..: 3 3 3 3 3 3 3 3 3 3 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -0.3702 -0.0602  0.0634  0.0646  0.1767  0.4024    1064

## Observations per group: 18440, 17906, 18395, 17456. 1064 missing.
##  Factor w/ 4 levels "[-0.814,-0.102]",..: 2 2 2 2 2 2 2 2 2 2 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -0.8136 -0.1021  0.0652 -0.0224  0.1637  0.4614    1064

## Observations per group: 18765, 18330, 18095, 17970. 101 missing.
##  Factor w/ 4 levels "[0,5.1]","(5.1,7.7]",..: 2 4 2 3 1 3 3 3 3 2 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   5.100   7.700   9.028  11.400 100.000     101

## Observations per group: 18379, 18323, 18173, 18281. 105 missing.
##  Factor w/ 4 levels "[0,24.1]","(24.1,32.6]",..: 3 1 2 2 4 2 1 3 2 2 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0    24.1    32.6    34.8    43.8   100.0     105

## Observations per group: 18303, 18828, 17766, 18259. 105 missing.
##  Factor w/ 4 levels "[0,20.1]","(20.1,23.8]",..: 2 2 2 3 1 4 3 4 3 1 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   20.10   23.80   23.95   27.50  100.00     105

## Observations per group: 18672, 18068, 18275, 18141. 105 missing.
##  Factor w/ 4 levels "[0,13.5]","(13.5,17.9]",..: 2 4 4 3 2 2 4 1 2 2 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0    13.5    17.9    19.1    23.6   100.0     105

## Observations per group: 18588, 18209, 18113, 18246. 105 missing.
##  Factor w/ 4 levels "[0,5]","(5,8.4]",..: 3 3 3 3 1 2 3 2 2 3 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   5.000   8.400   9.295  12.500 100.000     105

## Observations per group: 18566, 18226, 18085, 18279. 105 missing.
##  Factor w/ 4 levels "[0,7.1]","(7.1,11.8]",..: 3 4 3 3 3 3 3 1 3 4 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    7.10   11.80   12.86   17.40  100.00     105

## Observations per group: 19155, 17839, 18195, 17967. 105 missing.
##  Factor w/ 4 levels "[0,3.6]","(3.6,5.5]",..: 2 3 4 1 2 3 1 3 2 3 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   3.600   5.500   6.227   8.100 100.000     105

## Observations per group: 18430, 18252, 18303, 18276. 0 missing.
##  Factor w/ 4 levels "[0,0.7]","(0.7,3.7]",..: 3 4 4 2 4 3 4 3 3 3 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.70    3.70   13.27   14.40  100.00

## Observations per group: 18472, 18246, 18233, 18310. 0 missing.
##  Factor w/ 4 levels "[0,2.4]","(2.4,7]",..: 1 1 1 3 1 3 2 1 1 1 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    2.40    7.00   16.86   20.40  100.00

## Observations per group: 20124, 17253, 17651, 18233. 0 missing.
##  Factor w/ 4 levels "[0,0.2]","(0.2,1.4]",..: 2 3 2 1 3 1 1 1 1 2 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.20    1.40    4.59    4.80   91.30

## Observations per group: 18351, 18331, 18285, 18294. 0 missing.
##  Factor w/ 4 levels "[0,39.4]","(39.4,71.4]",..: 3 2 3 3 2 3 3 3 4 3 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   39.40   71.40   62.03   88.30  100.00

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.0000   0.0000   0.0000   0.7279   0.4000 100.0000

Correlations

!! full2015_varofInterest

## corrplot 0.84 loaded
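
Because several variables above contain missing values, a correlation matrix needs an explicit rule for handling them. The following is a sketch using base R's `cor()` with pairwise-complete observations on simulated data. The column names are hypothetical, and the `corrplot` call is guarded so the block still runs when that package is not installed.

```r
set.seed(3)
n <- 200

# Simulated stand-in data; column names are illustrative only.
dat <- data.frame(
  income       = rnorm(n),
  poverty      = rnorm(n),
  unemployment = rnorm(n)
)
dat$poverty <- -0.6 * dat$income + 0.8 * dat$poverty  # induce a negative correlation
dat$income[sample(n, 10)] <- NA                       # introduce missing values

# Each pairwise correlation uses every row where both columns are present.
cor_mat <- cor(dat, use = "pairwise.complete.obs")
print(round(cor_mat, 2))

# Optional visualisation with the corrplot package, if available.
if (interactive() && requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_mat, method = "circle")
}
```

With `use = "complete.obs"` instead, any row with a missing value in any column would be dropped from every pair, which can discard far more data.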

Scatter Plots

ANOVA

!! full_2015_varOfInterest

## Call:
##    aov(formula = IncomePerCap ~ EthnicPlurality, data = anova_dat)
## 
## Terms:
##                 EthnicPlurality    Residuals
## Sum of Squares     1.632113e+12 5.681161e+12
## Deg. of Freedom               4        69566
## 
## Residual standard error: 9036.911
## Estimated effects may be unbalanced
## 3589 observations deleted due to missingness
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "na.action"     "contrasts"     "xlevels"       "call"         
## [13] "terms"         "model"
##                    Df    Sum Sq   Mean Sq F value Pr(>F)    
## EthnicPlurality     4 1.632e+12 4.080e+11    4996 <2e-16 ***
## Residuals       69566 5.681e+12 8.167e+07                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 3589 observations deleted due to missingness

## Call:
##    aov(formula = IncomePerCap ~ WorkPlurality, data = anova_dat)
## 
## Terms:
##                 WorkPlurality    Residuals
## Sum of Squares   2.642140e+12 4.671134e+12
## Deg. of Freedom             6        69564
## 
## Residual standard error: 8194.432
## Estimated effects may be unbalanced
## 3589 observations deleted due to missingness
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "na.action"     "contrasts"     "xlevels"       "call"         
## [13] "terms"         "model"
##                  Df    Sum Sq   Mean Sq F value Pr(>F)    
## WorkPlurality     6 2.642e+12 4.404e+11    6558 <2e-16 ***
## Residuals     69564 4.671e+12 6.715e+07                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 3589 observations deleted due to missingness
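
The F values and residual standard errors in the two ANOVA tables follow directly from the printed sums of squares and degrees of freedom. The sketch below reproduces that arithmetic and adds eta-squared (the share of total variation attributed to the factor), which the original output does not report. The data frame `ss` and its column names are introduced here for illustration only.

```r
# Sums of squares and degrees of freedom copied from the aov output above.
ss <- data.frame(
  term      = c("EthnicPlurality", "WorkPlurality"),
  ss_effect = c(1.632113e12, 2.642140e12),
  ss_resid  = c(5.681161e12, 4.671134e12),
  df_effect = c(4, 6),
  df_resid  = c(69566, 69564)
)

# Mean squares are sums of squares divided by their degrees of freedom.
ss$ms_effect <- ss$ss_effect / ss$df_effect
ss$ms_resid  <- ss$ss_resid / ss$df_resid

# F is the ratio of the two mean squares.
ss$f_value <- ss$ms_effect / ss$ms_resid

# The residual standard error is the square root of the residual mean square.
ss$resid_se <- sqrt(ss$ms_resid)

# Eta-squared: effect sum of squares over total sum of squares.
ss$eta_sq <- ss$ss_effect / (ss$ss_effect + ss$ss_resid)

print(ss[, c("term", "f_value", "resid_se", "eta_sq")])
```

Both models share the same total sum of squares because they are fitted to the same response on the same rows, so the eta-squared values are directly comparable.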

Chi-Squared Tests

Work by Ethnicity. Cell entries are observed values, with expected values in parentheses.

Work          Asian        Black        Hispanic     Native     White          Total
Construction  0 (18)       31 (104)     639 (137)    1 (3)      358 (767)      1029 (1029)
Error         0 (0)        0 (0)        0 (0)        0 (0)      0 (0)          0 (0)
Office        128 (192)    1562 (1087)  2197 (1424)  17 (30)    6811 (7982)    10715 (10715)
Production    19 (70)      480 (398)    993 (521)    6 (11)     2422 (2920)    3920 (3920)
Professional  927 (851)    2345 (4803)  2506 (6294)  114 (131)  41474 (35287)  47366 (47366)
SelfEmployed  0 (0)        0 (1)        1 (2)        0 (0)      11 (9)         12 (12)
Service       240 (174)    2744 (983)   3273 (1288)  49 (27)    3388 (7222)    9694 (9694)
Unemployment  0 (8)        257 (43)     113 (56)     15 (1)     39 (316)       424 (424)
Total         1314 (1314)  7419 (7419)  9722 (9722)  202 (202)  54503 (54503)  73160 (73160)

χ2=NaN · df=28 · Cramer’s V=NaN · Fisher’s p=0.000
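
The reported χ2 and Cramer’s V are NaN, and the "Error" level has no observations at all. An all-zero row has expected counts of zero, so its contribution to the statistic is 0/0. The sketch below rebuilds the observed counts from the cross-tabulation, recomputes the expected counts as row total × column total / grand total, and shows one possible remedy: dropping the empty row before testing. The object names are hypothetical, and this is offered as a suggestion rather than the author's procedure.

```r
# Observed counts copied from the Work by Ethnicity cross-tabulation.
obs <- matrix(
  c(  0,   31,  639,   1,   358,
      0,    0,    0,   0,     0,
    128, 1562, 2197,  17,  6811,
     19,  480,  993,   6,  2422,
    927, 2345, 2506, 114, 41474,
      0,    0,    1,   0,    11,
    240, 2744, 3273,  49,  3388,
      0,  257,  113,  15,    39),
  ncol = 5, byrow = TRUE,
  dimnames = list(
    Work = c("Construction", "Error", "Office", "Production",
             "Professional", "SelfEmployed", "Service", "Unemployment"),
    Ethnicity = c("Asian", "Black", "Hispanic", "Native", "White")
  )
)

# Expected count for each cell under independence.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# The all-zero "Error" row yields 0/0 terms, so the total is NaN.
chi_all <- sum((obs - expected)^2 / expected)

# Drop empty rows, then test; warnings about small expected counts
# (for example in the SelfEmployed row) are suppressed here.
obs_kept  <- obs[rowSums(obs) > 0, ]
test_kept <- suppressWarnings(chisq.test(obs_kept))

# Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1))).
cramers_v <- sqrt(unname(test_kept$statistic) /
                  (sum(obs_kept) * (min(dim(obs_kept)) - 1)))

print(round(expected))
print(test_kept)
print(cramers_v)
```

Removing the empty level reduces the degrees of freedom from 28 to 24. Because the SelfEmployed row still has very small expected counts, the chi-squared approximation should be read with caution for that row.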

Regression

Exhaustive Search

## Reordering variables and trying again:

##  [1] "np"        "nrbar"     "d"         "rbar"      "thetab"    "first"    
##  [7] "last"      "vorder"    "tol"       "rss"       "bound"     "nvmax"    
## [13] "ress"      "ir"        "nbest"     "lopt"      "il"        "ier"      
## [19] "xnames"    "method"    "force.in"  "force.out" "sserr"     "intercept"
## [25] "lindep"    "reorder"   "nullrss"   "nn"        "call"

## [1] 14

## [1] 13

## [1] 12

Forward Selection

## Reordering variables and trying again:

Backward Selection

Backward selection, this time with nvmax=10 and nbest=2.

## Reordering variables and trying again:

Sequential Replacement (seqrep)

## Reordering variables and trying again:
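
The component names printed under the exhaustive search match those of a `regsubsets` object from the `leaps` package, and "Reordering variables and trying again:" is the message that function prints when it finds linear dependencies among the predictors. The following sketch runs the four search strategies named in this section on simulated data and picks a model size by adjusted R², Mallows' Cp, and BIC. The data, the variable names, and the choice of criteria are all assumptions; the original output does not say which criteria produced the sizes 14, 13, and 12.

```r
set.seed(4)
n <- 300

# Simulated stand-in data: eight predictors, two of which drive the response.
sim <- as.data.frame(
  matrix(rnorm(n * 8), ncol = 8, dimnames = list(NULL, paste0("x", 1:8)))
)
sim$y <- 2 * sim$x1 - 1.5 * sim$x3 + rnorm(n)

methods_to_try <- c("exhaustive", "forward", "backward", "seqrep")

# Guarded so the block still runs when leaps is not installed.
if (requireNamespace("leaps", quietly = TRUE)) {
  fits <- lapply(methods_to_try, function(m) {
    leaps::regsubsets(y ~ ., data = sim, nvmax = 5, method = m)
  })
  names(fits) <- methods_to_try

  # With the default nbest = 1, the row index equals the number of predictors.
  best_sizes <- sapply(fits, function(fit) {
    s <- summary(fit)
    c(adjr2 = which.max(s$adjr2),
      cp    = which.min(s$cp),
      bic   = which.min(s$bic))
  })
  print(best_sizes)
}
```

When `nbest` is greater than 1, as in the backward run with nbest=2, each model size contributes several rows, so the selected row index no longer equals the number of predictors.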

Conclusion